Overview

Dataset statistics

Number of variables: 42
Number of observations: 199522
Missing cells: 874
Missing cells (%): < 0.1%
Duplicate rows: 3229
Duplicate rows (%): 1.6%
Total size in memory: 416.5 MiB
Average record size in memory: 2.1 KiB
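
The percentage and per-record figures above follow from the raw counts. A minimal Python sketch of that arithmetic, assuming that the missing-cell share is taken over all cells (variables × observations) and that 1 MiB = 1024 KiB; the variable names are illustrative:

```python
# Raw counts copied from the dataset statistics above.
n_variables = 42
n_observations = 199522
missing_cells = 874
duplicate_rows = 3229
total_size_mib = 416.5

# Assumption: the missing-cell share is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Share of rows flagged as duplicates.
duplicate_pct = 100 * duplicate_rows / n_observations

# Average record size, converting MiB to KiB (1 MiB = 1024 KiB).
avg_record_kib = total_size_mib * 1024 / n_observations

print(f"missing cells: {missing_pct:.4f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"average record size: {avg_record_kib:.1f} KiB")
```

Under these assumptions the results round to the values shown in the report (below 0.1%, 1.6%, and 2.1 KiB).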

Variable types

CAT: 32
NUM: 10

Reproduction

Analysis started: 2020-03-25 03:43:24.021480
Analysis finished: 2020-03-25 03:48:39.512787
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
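
The two timestamps give the wall-clock duration of the profiling run. A small sketch of that calculation with the standard library; the format string is an assumption matched to the timestamps as printed here:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2020-03-25 03:43:24.021480", FMT)
finished = datetime.strptime("2020-03-25 03:48:39.512787", FMT)

# Subtracting two datetimes yields a timedelta.
elapsed = finished - started
print(f"profiling took {elapsed.total_seconds():.1f} s")
```

The run took a little over five minutes for 199522 rows and 42 variables.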

Warnings

Dataset has 3229 (1.6%) duplicate rows  Duplicates
state of previous residence has a high cardinality: 51 distinct values  High cardinality
state of previous residence is highly correlated with region of previous residence  High Correlation
region of previous residence is highly correlated with state of previous residence  High Correlation
detailed household summary in household is highly correlated with detailed household and family stat  High Correlation
detailed household and family stat is highly correlated with detailed household summary in household  High Correlation
live in this house 1 year ago is highly correlated with migration code-change in msa and 4 other fields  High Correlation
migration code-change in msa is highly correlated with live in this house 1 year ago and 1 other field  High Correlation
migration code-change in reg is highly correlated with live in this house 1 year ago and 1 other field  High Correlation
migration code-move within reg is highly correlated with live in this house 1 year ago and 1 other field  High Correlation
migration prev res in sunbelt is highly correlated with live in this house 1 year ago and 1 other field  High Correlation
year is highly correlated with migration code-change in msa and 4 other fields  High Correlation
dividends from stocks is highly skewed (γ1 = 27.78643274)  Skewed
age has 2839 (1.4%) zeros  Zeros
detailed industry recode has 100683 (50.5%) zeros  Zeros
detailed occupation recode has 100683 (50.5%) zeros  Zeros
wage per hour has 188218 (94.3%) zeros  Zeros
capital gains has 192143 (96.3%) zeros  Zeros
capital losses has 195616 (98.0%) zeros  Zeros
dividends from stocks has 178381 (89.4%) zeros  Zeros
num persons worked for employer has 95982 (48.1%) zeros  Zeros
weeks worked in year has 95982 (48.1%) zeros  Zeros
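
The zero percentages in the warnings can be recomputed from the zero counts and the number of observations. A minimal sketch, assuming each percentage is the zero count divided by all 199522 rows; the helper name `zero_share` is illustrative:

```python
N_OBSERVATIONS = 199522

# Zero counts copied from the warnings above.
zero_counts = {
    "age": 2839,
    "detailed industry recode": 100683,
    "wage per hour": 188218,
    "capital gains": 192143,
    "capital losses": 195616,
    "dividends from stocks": 178381,
    "num persons worked for employer": 95982,
}

def zero_share(count, total=N_OBSERVATIONS):
    """Return the percentage of observations equal to zero."""
    return 100 * count / total

# Round to one decimal place, as the report does.
zero_pct = {name: round(zero_share(c), 1) for name, c in zero_counts.items()}
for name, pct in zero_pct.items():
    print(f"{name}: {pct}% zeros")
```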

Variables

age
Real number (ℝ≥0)

ZEROS
Distinct count: 91
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.494005673559805
Minimum: 0
Maximum: 90
Zeros: 2839
Zeros (%): 1.4%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 15
Median: 33
Q3: 50
95-th percentile: 75
Maximum: 90
Range: 90
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 22.31078458
Coefficient of variation (CV): 0.6468017889
Kurtosis: -0.7327995225
Mean: 34.49400567
Median Absolute Deviation (MAD): 18.53511757
Skewness: 0.3732980656
Sum: 6882313
Variance: 497.7711085
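
Several of the statistics for age are derived from one another. A short sketch of those relationships, using only figures from this section and assuming the mean is taken over all 199522 observations (age has no missing values):

```python
# Values copied from the statistics for `age` above.
n_observations = 199522
total = 6882313            # Sum
std = 22.31078458          # Standard deviation
q1, q3 = 15, 50
minimum, maximum = 0, 90

mean = total / n_observations   # Mean = Sum / number of observations
cv = std / mean                 # Coefficient of variation = std / mean
variance = std ** 2             # Variance = squared standard deviation
iqr = q3 - q1                   # Interquartile range = Q3 - Q1
value_range = maximum - minimum # Range = Maximum - Minimum

print(mean, cv, variance, iqr, value_range)
```

The recomputed values agree with the reported mean, CV, variance, IQR, and range up to rounding.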
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 8.5 14.5 17.5 ... 84.5 85.5 87.5 89.5 90. ], "bayesian blocks" binning strategy used)
Value  Count  Frequency (%)
34  3489  1.7%
35  3450  1.7%
36  3353  1.7%
31  3351  1.7%
33  3340  1.7%
5  3332  1.7%
4  3318  1.7%
3  3279  1.6%
37  3278  1.6%
38  3277  1.6%
Other values (81)  166055  83.2%

Value  Count  Frequency (%)
0  2839  1.4%
1  3138  1.6%
2  3236  1.6%
3  3279  1.6%
4  3318  1.7%

Value  Count  Frequency (%)
90  725  0.4%
89  195  0.1%
88  241  0.1%
87  301  0.2%
86  348  0.2%
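
The 5-th percentile of age can be cross-checked against the counts of the smallest values. The sketch below walks the cumulative counts until 5% of the observations are covered; this nearest-rank lookup is an assumption, and the interpolation rule used by the profiling tool may differ for other data:

```python
n_observations = 199522

# Counts for the five smallest ages, copied from the table above.
smallest_values = [(0, 2839), (1, 3138), (2, 3236), (3, 3279), (4, 3318)]

def percentile_from_counts(pairs, q, total):
    """Return the value at quantile q, or None if q lies beyond the listed values."""
    target = q * total           # number of observations at or below the quantile
    cumulative = 0
    for value, count in pairs:   # pairs must be sorted by value
        cumulative += count
        if cumulative >= target:
            return value
    return None

p5 = percentile_from_counts(smallest_values, 0.05, n_observations)
print(p5)
```

Ages 0 to 2 cover 9213 observations, fewer than 5% of 199522, while ages 0 to 3 cover 12492, so the lookup lands on 3, matching the reported 5-th percentile.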
 

class of worker
Categorical

Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Not in universe  100244  50.2%
Private  72028  36.1%
Self-employed-not incorporated  8445  4.2%
Local government  7784  3.9%
State government  4227  2.1%
Self-employed-incorporated  3265  1.6%
Federal government  2925  1.5%
Never worked  439  0.2%
Without pay  165  0.1%

Length

Max length: 30
Mean length: 13.02114554
Min length: 7

Value  Count  Frequency (%)
Lowercase_Letter  21  72.4%
Uppercase_Letter  6  20.7%
Space_Separator  1  3.4%
Dash_Punctuation  1  3.4%

Value  Count  Frequency (%)
Latin  27  93.1%
Common  2  6.9%

Value  Count  Frequency (%)
ASCII  29  100.0%
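
The length statistics for class of worker can be reproduced from the frequency table. A minimal sketch, assuming the mean length is the count-weighted average of the category string lengths exactly as they are printed in this report:

```python
# Category counts copied from the frequency table for `class of worker`.
counts = {
    "Not in universe": 100244,
    "Private": 72028,
    "Self-employed-not incorporated": 8445,
    "Local government": 7784,
    "State government": 4227,
    "Self-employed-incorporated": 3265,
    "Federal government": 2925,
    "Never worked": 439,
    "Without pay": 165,
}

n = sum(counts.values())

# Weight each category's string length by how often it occurs.
mean_length = sum(len(value) * count for value, count in counts.items()) / n
max_length = max(len(value) for value in counts)
min_length = min(len(value) for value in counts)

print(n, round(mean_length, 8), max_length, min_length)
```

The counts sum to the 199522 observations, and the weighted mean agrees with the reported mean length to the printed precision.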
 

detailed industry recode
Real number (ℝ≥0)

ZEROS
Distinct count: 52
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.352397229378214
Minimum: 0
Maximum: 51
Zeros: 100683
Zeros (%): 50.5%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 33
95-th percentile: 44
Maximum: 51
Range: 51
Interquartile range (IQR): 33

Descriptive statistics

Standard deviation: 18.06714138
Coefficient of variation (CV): 1.176828681
Kurtosis: -1.501116
Mean: 15.35239723
Median Absolute Deviation (MAD): 17.05560526
Skewness: 0.5166794876
Sum: 3063141
Variance: 326.4215977
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 46.5 47.5 49.5 50.5 51. ], "bayesian blocks" binning strategy used)
Value  Count  Frequency (%)
0  100683  50.5%
33  17070  8.6%
43  8283  4.2%
4  5984  3.0%
42  4683  2.3%
45  4482  2.2%
29  4209  2.1%
37  4022  2.0%
41  3964  2.0%
32  3596  1.8%
Other values (42)  42546  21.3%

Value  Count  Frequency (%)
0  100683  50.5%
1  827  0.4%
2  2196  1.1%
3  563  0.3%
4  5984  3.0%

Value  Count  Frequency (%)
51  36  < 0.1%
50  1704  0.9%
49  610  0.3%
48  652  0.3%
47  1644  0.8%

detailed occupation recode
Real number (ℝ≥0)

ZEROS
Distinct count: 47
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.306612804603
Minimum: 0
Maximum: 46
Zeros: 100683
Zeros (%): 50.5%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 26
95-th percentile: 38
Maximum: 46
Range: 46
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 14.45421797
Coefficient of variation (CV): 1.278386217
Kurtosis: -0.8965458871
Mean: 11.3066128
Median Absolute Deviation (MAD): 12.89770471
Skewness: 0.8292305119
Sum: 2255918
Variance: 208.9244173
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 42.5 43.5 44.5 45.5 46. ], "bayesian blocks" binning strategy used)
Value  Count  Frequency (%)
0  100683  50.5%
2  8756  4.4%
26  7887  4.0%
19  5413  2.7%
29  5105  2.6%
36  4145  2.1%
34  4025  2.0%
10  3683  1.8%
16  3445  1.7%
23  3392  1.7%
Other values (37)  52988  26.6%

Value  Count  Frequency (%)
0  100683  50.5%
1  544  0.3%
2  8756  4.4%
3  3195  1.6%
4  1364  0.7%

Value  Count  Frequency (%)
46  36  < 0.1%
45  172  0.1%
44  1592  0.8%
43  1382  0.7%
42  1918  1.0%

education
Categorical

Distinct count: 17
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
High school graduate  48406  24.3%
Children  47422  23.8%
Some college but no degree  27820  13.9%
Bachelors degree(BA AB BS)  19865  10.0%
7th and 8th grade  8007  4.0%
10th grade  7557  3.8%
11th grade  6876  3.4%
Masters degree(MA MS MEng MEd MSW MBA)  6541  3.3%
9th grade  6230  3.1%
Associates degree-occup /vocational  5358  2.7%
Other values (7)  15440  7.7%

Length

Max length: 38
Mean length: 18.86397991
Min length: 8

Value  Count  Frequency (%)
Lowercase_Letter  19  40.4%
Uppercase_Letter  13  27.7%
Decimal_Number  10  21.3%
Space_Separator  1  2.1%
Open_Punctuation  1  2.1%
Close_Punctuation  1  2.1%
Other_Punctuation  1  2.1%
Dash_Punctuation  1  2.1%

Value  Count  Frequency (%)
Latin  32  68.1%
Common  15  31.9%

Value  Count  Frequency (%)
ASCII  47  100.0%

wage per hour
Real number (ℝ≥0)

ZEROS
Distinct count: 1240
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 55.427185974479
Minimum: 0
Maximum: 9999
Zeros: 188218
Zeros (%): 94.3%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 495
Maximum: 9999
Range: 9999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 274.8971148
Coefficient of variation (CV): 4.959607997
Kurtosis: 155.2181323
Mean: 55.42718597
Median Absolute Deviation (MAD): 104.5742276
Skewness: 8.935073881
Sum: 11058943
Variance: 75568.42372
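
The MAD figure for wage per hour deserves a second look. The median is 0 and 94.3% of the values equal it, so a median absolute deviation in the usual sense would be 0. The reported 104.57 is instead consistent with the mean absolute deviation around the mean. This is an inference from the numbers in this report, not documented behaviour of the tool. The sketch below checks it using only figures from this section, assuming the single observation of 20 (listed among the smallest values of this variable) is the only nonzero value below the mean:

```python
n = 199522
total = 11058943                 # Sum
zeros = 188218                   # observations equal to 0
below_mean_nonzero = [(20, 1)]   # (value, count) pairs below the mean, besides 0

mean = total / n

# Deviations from the mean sum to zero, so the total absolute deviation is
# twice the total shortfall of the observations that lie below the mean.
shortfall = zeros * mean + sum((mean - v) * c for v, c in below_mean_nonzero)
mean_abs_dev = 2 * shortfall / n

# More than half of the observations equal the median (0), so the median
# of |x - median| is 0.
median_abs_dev = 0.0 if zeros > n / 2 else None

print(round(mean_abs_dev, 4), median_abs_dev)
```

The same pattern (a MAD near twice the zero share times the mean) appears for the other zero-inflated variables, for example capital gains.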
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 10. 195. 202.5 212.5 ... 2506. 2812.5 3325. 5512.5 9999. ], "bayesian blocks" binning strategy used)
Value  Count  Frequency (%)
0  188218  94.3%
500  734  0.4%
600  546  0.3%
700  534  0.3%
800  507  0.3%
1000  386  0.2%
425  376  0.2%
900  336  0.2%
550  280  0.1%
1200  256  0.1%
Other values (1230)  7349  3.7%

Value  Count  Frequency (%)
0  188218  94.3%
20  1  < 0.1%
70  1  < 0.1%
75  2  < 0.1%
100  11  < 0.1%

Value  Count  Frequency (%)
9999  1  < 0.1%
9916  1  < 0.1%
9800  2  < 0.1%
9400  2  < 0.1%
9000  1  < 0.1%
 
Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Not in universe  186942  93.7%
High school  6892  3.5%
College or university  5688  2.9%

Length

Max length: 21
Mean length: 15.03287858
Min length: 11

Value  Count  Frequency (%)
Lowercase_Letter  14  77.8%
Uppercase_Letter  3  16.7%
Space_Separator  1  5.6%

Value  Count  Frequency (%)
Latin  17  94.4%
Common  1  5.6%

Value  Count  Frequency (%)
ASCII  18  100.0%

marital stat
Categorical

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Never married  86485  43.3%
Married-civilian spouse present  84222  42.2%
Divorced  12710  6.4%
Widowed  10462  5.2%
Separated  3460  1.7%
Married-spouse absent  1518  0.8%
Married-A F spouse present  665  0.3%

Length

Max length: 31
Mean length: 19.99984463
Min length: 7

Value  Count  Frequency (%)
Lowercase_Letter  17  65.4%
Uppercase_Letter  7  26.9%
Space_Separator  1  3.8%
Dash_Punctuation  1  3.8%

Value  Count  Frequency (%)
Latin  24  92.3%
Common  2  7.7%

Value  Count  Frequency (%)
ASCII  26  100.0%
 
Distinct count: 24
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Not in universe or children  100683  50.5%
Retail trade  17070  8.6%
Manufacturing-durable goods  9015  4.5%
Education  8283  4.2%
Manufacturing-nondurable goods  6897  3.5%
Finance insurance and real estate  6145  3.1%
Construction  5984  3.0%
Business and repair services  5651  2.8%
Medical except hospital  4683  2.3%
Public administration  4610  2.3%
Other values (14)  30501  15.3%

Length

Max length: 35
Mean length: 23.39613175
Min length: 6

Value  Count  Frequency (%)
Lowercase_Letter  21  55.3%
Uppercase_Letter  15  39.5%
Space_Separator  1  2.6%
Dash_Punctuation  1  2.6%

Value  Count  Frequency (%)
Latin  36  94.7%
Common  2  5.3%

Value  Count  Frequency (%)
ASCII  38  100.0%
 
Distinct count: 15
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Not in universe  100683  50.5%
Adm support including clerical  14837  7.4%
Professional specialty  13940  7.0%
Executive admin and managerial  12495  6.3%
Other service  12099  6.1%
Sales  11783  5.9%
Precision production craft & repair  10518  5.3%
Machine operators assmblrs & inspctrs  6379  3.2%
Handlers equip cleaners etc  4127  2.1%
Transportation and material moving  4020  2.0%
Other values (5)  8641  4.3%

Length

Max length: 37
Mean length: 19.76420144
Min length: 5

Value  Count  Frequency (%)
Lowercase_Letter  22  64.7%
Uppercase_Letter  10  29.4%
Space_Separator  1  2.9%
Other_Punctuation  1  2.9%

Value  Count  Frequency (%)
Latin  32  94.1%
Common  2  5.9%

Value  Count  Frequency (%)
ASCII  34  100.0%

race
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
White  167364  83.9%
Black  20415  10.2%
Asian or Pacific Islander  5835  2.9%
Other  3657  1.8%
Amer Indian Aleut or Eskimo  2251  1.1%

Length

Max length: 27
Mean length: 5.833101112
Min length: 5

Value  Count  Frequency (%)
Lowercase_Letter  16  66.7%
Uppercase_Letter  7  29.2%
Space_Separator  1  4.2%

Value  Count  Frequency (%)
Latin  23  95.8%
Common  1  4.2%

Value  Count  Frequency (%)
ASCII  24  100.0%

hispanic origin
Categorical

Distinct count: 9
Unique (%): < 0.1%
Missing: 874
Missing (%): 0.4%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
All other  171906  86.2%
Mexican-American  8079  4.0%
Mexican (Mexicano)  7234  3.6%
Central or South American  3895  2.0%
Puerto Rican  3313  1.7%
Other Spanish  2485  1.2%
Cuban  1126  0.6%
Do not know  306  0.2%
Chicano  304  0.2%
(Missing)  874  0.4%

Length

Max length: 25
Mean length: 9.97289522
Min length: 3

Value  Count  Frequency (%)
Lowercase_Letter  18  60.0%
Uppercase_Letter  8  26.7%
Space_Separator  1  3.3%
Open_Punctuation  1  3.3%
Close_Punctuation  1  3.3%
Dash_Punctuation  1  3.3%

Value  Count  Frequency (%)
Latin  26  86.7%
Common  4  13.3%

Value  Count  Frequency (%)
ASCII  30  100.0%

sex
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Female  103983  52.1%
Male  95539  47.9%

Length

Max length: 6
Mean length: 5.042321148
Min length: 4

Value  Count  Frequency (%)
Lowercase_Letter  4  66.7%
Uppercase_Letter  2  33.3%

Value  Count  Frequency (%)
Latin  6  100.0%

Value  Count  Frequency (%)
ASCII  6  100.0%
 
Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Not in universe  180458  90.4%
No  16034  8.0%
Yes  3030  1.5%

Length

Max length: 15
Mean length: 13.77305761
Min length: 2

Value  Count  Frequency (%)
Lowercase_Letter  9  75.0%
Uppercase_Letter  2  16.7%
Space_Separator  1  8.3%

Value  Count  Frequency (%)
Latin  11  91.7%
Common  1  8.3%

Value  Count  Frequency (%)
ASCII  12  100.0%
 
Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Not in universe  193452  97.0%
Other job loser  2038  1.0%
Re-entrant  2019  1.0%
Job loser - on layoff  976  0.5%
Job leaver  598  0.3%
New entrant  439  0.2%

Length

Max length: 21
Mean length: 14.95496737
Min length: 10

Value  Count  Frequency (%)
Lowercase_Letter  17  73.9%
Uppercase_Letter  4  17.4%
Space_Separator  1  4.3%
Dash_Punctuation  1  4.3%

Value  Count  Frequency (%)
Latin  21  91.3%
Common  2  8.7%

Value  Count  Frequency (%)
ASCII  23  100.0%
 
Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Children or Armed Forces  123769  62.0%
Full-time schedules  40736  20.4%
Not in labor force  26807  13.4%
PT for non-econ reasons usually FT  3322  1.7%
Unemployed full-time  2311  1.2%
PT for econ reasons usually PT  1209  0.6%
Unemployed part- time  843  0.4%
PT for econ reasons usually FT  525  0.3%

Length

Max length: 34
Mean length: 22.33266006
Min length: 18

Value  Count  Frequency (%)
Lowercase_Letter  18  66.7%
Uppercase_Letter  7  25.9%
Space_Separator  1  3.7%
Dash_Punctuation  1  3.7%

Value  Count  Frequency (%)
Latin  25  92.6%
Common  2  7.4%

Value  Count  Frequency (%)
ASCII  27  100.0%

capital gains
Real number (ℝ≥0)

ZEROS
Distinct count: 132
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 434.72116859293715
Minimum: 0
Maximum: 99999
Zeros: 192143
Zeros (%): 96.3%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 99999
Range: 99999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4697.542951
Coefficient of variation (CV): 10.80587579
Kurtosis: 393.0608462
Mean: 434.7211686
Median Absolute Deviation (MAD): 837.3339304
Skewness: 18.99077459
Sum: 86736437
Variance: 22066909.78
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 5.70000e+01 4.97500e+02 7.54000e+02 9.52500e+02 ... 2.35820e+04 3.09615e+04 3.77025e+04 7.06545e+04 9.99990e+04], "bayesian blocks" binning strategy used)
Value  Count  Frequency (%)
0  192143  96.3%
15024  788  0.4%
7688  609  0.3%
7298  582  0.3%
99999  390  0.2%
3103  237  0.1%
5178  207  0.1%
5013  158  0.1%
4386  151  0.1%
3325  121  0.1%
Other values (122)  4136  2.1%

Value  Count  Frequency (%)
0  192143  96.3%
114  11  < 0.1%
401  33  < 0.1%
594  88  < 0.1%
914  17  < 0.1%

Value  Count  Frequency (%)
99999  390  0.2%
41310  2  < 0.1%
34095  11  < 0.1%
27828  94  < 0.1%
25236  23  < 0.1%

capital losses
Real number (ℝ≥0)

ZEROS
Distinct count: 113
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.313975401208886
Minimum: 0
Maximum: 4608
Zeros: 195616
Zeros (%): 98.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 4608
Range: 4608
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 271.8970969
Coefficient of variation (CV): 7.286736242
Kurtosis: 61.63260032
Mean: 37.3139754
Median Absolute Deviation (MAD): 73.16697519
Skewness: 7.632544603
Sum: 7444959
Variance: 73928.03131
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 77.5 184. 639. 1198. ... 2713. 2914. 3835. 4128. 4608. ], "bayesian blocks" binning strategy used)
Value  Count  Frequency (%)
0  195616  98.0%
1902  407  0.2%
1977  381  0.2%
1887  364  0.2%
1602  193  0.1%
2415  122  0.1%
1485  95  < 0.1%
1848  88  < 0.1%
1876  87  < 0.1%
1672  85  < 0.1%
Other values (103)  2084  1.0%

Value  Count  Frequency (%)
0  195616  98.0%
155  1  < 0.1%
213  10  < 0.1%
323  10  < 0.1%
419  29  < 0.1%

Value  Count  Frequency (%)
4608  4  < 0.1%
4356  30  < 0.1%
3900  2  < 0.1%
3770  5  < 0.1%
3683  4  < 0.1%

dividends from stocks
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 1478
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 197.53052294985014
Minimum: 0
Maximum: 99999
Zeros: 178381
Zeros (%): 89.4%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 400
Maximum: 99999
Range: 99999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1984.168581
Coefficient of variation (CV): 10.0448708
Kurtosis: 1090.558326
Mean: 197.5305229
Median Absolute Deviation (MAD): 364.2574674
Skewness: 27.78643274
Sum: 39411685
Variance: 3936924.959
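
The SKEWED flag reflects the large positive skewness (γ1 = 27.79): most values are 0 and a few are very large. The sketch below computes the plain moment-based skewness on toy data to show how a zero-inflated sample produces a positive value. Which estimator the profiling tool uses (for example, with or without bias correction) is not stated in this report, and the sample lists are hypothetical:

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy samples, not taken from the dataset.
symmetric = [1, 2, 3, 4, 5]
zero_inflated = [0, 0, 0, 0, 10]

print(skewness(symmetric))
print(skewness(zero_inflated))
```

A symmetric sample has zero skewness, while a single large value among zeros pulls the statistic well above zero.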
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 5.00000e-01 1.50000e+00 2.50000e+00 3.50000e+00 ... 4.91825e+04 4.99995e+04 5.00550e+04 9.75470e+04 9.99990e+04], "bayesian blocks" binning strategy used)
Value  Count  Frequency (%)
0  178381  89.4%
100  1148  0.6%
500  1030  0.5%
1000  894  0.4%
200  866  0.4%
50  832  0.4%
2000  574  0.3%
250  555  0.3%
150  549  0.3%
300  523  0.3%
Other values (1468)  14170  7.1%

Value  Count  Frequency (%)
0  178381  89.4%
1  472  0.2%
2  193  0.1%
3  129  0.1%
4  75  < 0.1%

Value  Count  Frequency (%)
99999  25  < 0.1%
95095  1  < 0.1%
75000  5  < 0.1%
70000  3  < 0.1%
66621  2  < 0.1%

tax filer stat
Categorical

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Nonfiler  75093  37.6%
Joint both under 65  67383  33.8%
Single  37421  18.8%
Joint both 65+  8332  4.2%
Head of household  7426  3.7%
Joint one under 65 & one 65+  3867  1.9%

Length

Max length: 28
Mean length: 12.31299305
Min length: 6

Value  Count  Frequency (%)
Lowercase_Letter  15  62.5%
Uppercase_Letter  4  16.7%
Decimal_Number  2  8.3%
Space_Separator  1  4.2%
Other_Punctuation  1  4.2%
Math_Symbol  1  4.2%

Value  Count  Frequency (%)
Latin  19  79.2%
Common  5  20.8%

Value  Count  Frequency (%)
ASCII  24  100.0%

region of previous residence
Categorical

HIGH CORRELATION
Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Not in universe  183749  92.1%
South  4889  2.5%
West  4074  2.0%
Midwest  3575  1.8%
Northeast  2705  1.4%
Abroad  530  0.3%

Length

Max length: 15
Mean length: 14.28176341
Min length: 4

Value  Count  Frequency (%)
Lowercase_Letter  14  70.0%
Uppercase_Letter  5  25.0%
Space_Separator  1  5.0%

Value  Count  Frequency (%)
Latin  19  95.0%
Common  1  5.0%

Value  Count  Frequency (%)
ASCII  20  100.0%

state of previous residence
Categorical

HIGH CARDINALITY
HIGH CORRELATION
Distinct count: 51
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Not in universe  183749  92.1%
California  1714  0.9%
Utah  1063  0.5%
Florida  849  0.4%
North Carolina  812  0.4%
?  708  0.4%
Abroad  671  0.3%
Oklahoma  626  0.3%
Minnesota  576  0.3%
Indiana  533  0.3%
Other values (41)  8221  4.1%

Length

Max length: 20
Mean length: 14.45687192
Min length: 1

Value  Count  Frequency (%)
Lowercase_Letter  24  52.2%
Uppercase_Letter  20  43.5%
Space_Separator  1  2.2%
Other_Punctuation  1  2.2%

Value  Count  Frequency (%)
Latin  44  95.7%
Common  2  4.3%

Value  Count  Frequency (%)
ASCII  46  100.0%
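
The HIGH CORRELATION flag between state and region of previous residence is unsurprising, since a state determines its region. This report does not say which association measure triggered the flag; the sketch below uses Cramér's V as an assumption, computed with the standard library on hypothetical toy rows:

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Cramér's V for two equally long sequences of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_totals), len(y_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical toy rows: each state maps to exactly one region.
states = ["California", "Utah", "Florida", "California", "Utah", "Florida"]
regions = ["West", "West", "South", "West", "West", "South"]
print(cramers_v(states, regions))
```

When one column fully determines the other, as in the toy rows, V reaches its maximum of 1; for independent columns it is 0.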
 

detailed household and family stat
Categorical

HIGH CORRELATION
Distinct count: 38
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Householder  53248  26.7%
Child <18 never marr not in subfamily  50326  25.2%
Spouse of householder  41695  20.9%
Nonfamily householder  22213  11.1%
Child 18+ never marr Not in a subfamily  12030  6.0%
Secondary individual  6122  3.1%
Other Rel 18+ ever marr not in subfamily  1955  1.0%
Grandchild <18 never marr child of subfamily RP  1868  0.9%
Other Rel 18+ never marr not in subfamily  1728  0.9%
Grandchild <18 never marr not in subfamily  1066  0.5%
Other values (28)  7271  3.6%

Length

Max length: 47
Mean length: 24.71381101
Min length: 11

Value  Count  Frequency (%)
Lowercase_Letter  21  60.0%
Uppercase_Letter  9  25.7%
Decimal_Number  2  5.7%
Math_Symbol  2  5.7%
Space_Separator  1  2.9%

Value  Count  Frequency (%)
Latin  30  85.7%
Common  5  14.3%

Value  Count  Frequency (%)
ASCII  35  100.0%

detailed household summary in household
Categorical

HIGH CORRELATION
Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Householder  75475  37.8%
Child under 18 never married  50426  25.3%
Spouse of householder  41709  20.9%
Child 18 or older  14430  7.2%
Other relative of householder  9702  4.9%
Nonrelative of householder  7601  3.8%
Group Quarters- Secondary individual  132  0.1%
Child under 18 ever married  47  < 0.1%

Length

Max length: 36
Mean length: 19.28788304
Min length: 11

Value  Count  Frequency (%)
Lowercase_Letter  18  62.1%
Uppercase_Letter  7  24.1%
Decimal_Number  2  6.9%
Space_Separator  1  3.4%
Dash_Punctuation  1  3.4%

Value  Count  Frequency (%)
Latin  25  86.2%
Common  4  13.8%

Value  Count  Frequency (%)
ASCII  29  100.0%

instance weight
Real number (ℝ≥0)

Distinct count: 99800
Unique (%): 50.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1740.380471226231
Minimum: 37.87
Maximum: 18656.3
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 37.87
5-th percentile: 395.341
Q1: 1061.6075
Median: 1618.31
Q3: 2188.61
95-th percentile: 3585.9095
Maximum: 18656.3
Range: 18618.43
Interquartile range (IQR): 1127.0025

Descriptive statistics

Standard deviation: 993.7706421
Coefficient of variation (CV): 0.5710076954
Kurtosis: 5.412470848
Mean: 1740.380471
Median Absolute Deviation (MAD): 741.3907709
Skewness: 1.43272897
Sum: 347244192.4
Variance: 987580.0891
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 37.87 92.5 116.535 148.965 182.24 ... 6424.78 7236.485 9270.445 12071.45 18656.3 ], "bayesian blocks" binning strategy used)
Value  Count  Frequency (%)
1191.21  32  < 0.1%
1601.4  32  < 0.1%
1787.34  32  < 0.1%
753.23  32  < 0.1%
1317.51  31  < 0.1%
707.9  31  < 0.1%
1070.15  30  < 0.1%
1839.19  28  < 0.1%
1002.02  28  < 0.1%
1009.39  28  < 0.1%
Other values (99790)  199218  99.8%

Value  Count  Frequency (%)
37.87  1  < 0.1%
39.11  1  < 0.1%
40.67  2  < 0.1%
42.82  2  < 0.1%
43.26  3  < 0.1%

Value  Count  Frequency (%)
18656.3  1  < 0.1%
16349.2  1  < 0.1%
13911.5  1  < 0.1%
13145.1  1  < 0.1%
13114.2  1  < 0.1%

migration code-change in msa
Categorical

HIGH CORRELATION
Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
?  99695  50.0%
Nonmover  82538  41.4%
MSA to MSA  10601  5.3%
NonMSA to nonMSA  2811  1.4%
Not in universe  1516  0.8%
MSA to nonMSA  790  0.4%
NonMSA to MSA  615  0.3%
Abroad to MSA  453  0.2%
Not identifiable  430  0.2%
Abroad to nonMSA  73  < 0.1%

Length

Max length: 16
Mean length: 4.841205481
Min length: 1

Value  Count  Frequency (%)
Lowercase_Letter  15  71.4%
Uppercase_Letter  4  19.0%
Space_Separator  1  4.8%
Other_Punctuation  1  4.8%

Value  Count  Frequency (%)
Latin  19  90.5%
Common  2  9.5%

Value  Count  Frequency (%)
ASCII  21  100.0%

migration code-change in reg
Categorical

HIGH CORRELATION
Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
?  99695  50.0%
Nonmover  82538  41.4%
Same county  9812  4.9%
Different county same state  2797  1.4%
Not in universe  1516  0.8%
Different region  1178  0.6%
Different state same division  991  0.5%
Abroad  530  0.3%
Different division same region  465  0.2%

Length

Max length: 30
Mean length: 5.166883852
Min length: 1

Value  Count  Frequency (%)
Lowercase_Letter  17  73.9%
Uppercase_Letter  4  17.4%
Space_Separator  1  4.3%
Other_Punctuation  1  4.3%

Value  Count  Frequency (%)
Latin  21  91.3%
Common  2  8.7%

Value  Count  Frequency (%)
ASCII  23  100.0%

migration code-move within reg
Categorical

HIGH CORRELATION
Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
?  99695  50.0%
Nonmover  82538  41.4%
Same county  9812  4.9%
Different county same state  2797  1.4%
Not in universe  1516  0.8%
Different state in South  973  0.5%
Different state in West  679  0.3%
Different state in Midwest  551  0.3%
Abroad  530  0.3%
Different state in Northeast  431  0.2%

Length

Max length: 28
Mean length: 5.186059683
Min length: 1

Value  Count  Frequency (%)
Lowercase_Letter  18  69.2%
Uppercase_Letter  6  23.1%
Space_Separator  1  3.8%
Other_Punctuation  1  3.8%

Value  Count  Frequency (%)
Latin  24  92.3%
Common  2  7.7%

Value  Count  Frequency (%)
ASCII  26  100.0%

live in this house 1 year ago
Categorical

HIGH CORRELATION
Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Not in universe under 1 year old  101211  50.7%
Yes  82538  41.4%
No  15773  7.9%

Length

Max length: 32
Mean length: 17.63169976
Min length: 2

Value  Count  Frequency (%)
Lowercase_Letter  13  76.5%
Uppercase_Letter  2  11.8%
Space_Separator  1  5.9%
Decimal_Number  1  5.9%

Value  Count  Frequency (%)
Latin  15  88.2%
Common  2  11.8%

Value  Count  Frequency (%)
ASCII  17  100.0%

migration prev res in sunbelt
Categorical

HIGH CORRELATION
Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
?  99695  50.0%
Not in universe  84054  42.1%
No  9987  5.0%
Yes  5786  2.9%

Length

Max length: 15
Mean length: 7.005929171
Min length: 1

Value  Count  Frequency (%)
Lowercase_Letter  9  69.2%
Uppercase_Letter  2  15.4%
Space_Separator  1  7.7%
Other_Punctuation  1  7.7%

Value  Count  Frequency (%)
Latin  11  84.6%
Common  2  15.4%

Value  Count  Frequency (%)
ASCII  13  100.0%

num persons worked for employer
Real number (ℝ≥0)

ZEROS
Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.956190294804583
Minimum: 0
Maximum: 6
Zeros: 95982
Zeros (%): 48.1%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 4
95-th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.365127378
Coefficient of variation (CV): 1.209047701
Kurtosis: -1.082258102
Mean: 1.956190295
Median Absolute Deviation (MAD): 2.10358415
Skewness: 0.7515530618
Sum: 390303
Variance: 5.593827514
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 1.5 2.5 3.5 4.5 5.5 6. ], "bayesian blocks" binning strategy used)
Value  Count  Frequency (%)
0  95982  48.1%
6  36511  18.3%
1  23109  11.6%
4  14379  7.2%
3  13425  6.7%
2  10081  5.1%
5  6035  3.0%

Value  Count  Frequency (%)
0  95982  48.1%
1  23109  11.6%
2  10081  5.1%
3  13425  6.7%
4  14379  7.2%

Value  Count  Frequency (%)
6  36511  18.3%
5  6035  3.0%
4  14379  7.2%
3  13425  6.7%
2  10081  5.1%
 
Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
Not in universe  144231  72.3%
Both parents present  38983  19.5%
Mother only present  12772  6.4%
Father only present  1883  0.9%
Neither parent present  1653  0.8%

Length

Max length: 22
Mean length: 16.32870561
Min length: 15

Value  Count  Frequency (%)
Lowercase_Letter  14  73.7%
Uppercase_Letter  4  21.1%
Space_Separator  1  5.3%

Value  Count  Frequency (%)
Latin  18  94.7%
Common  1  5.3%

Value  Count  Frequency (%)
ASCII  19  100.0%
 
Distinct count: 43
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
United-States  159162  79.8%
Mexico  10008  5.0%
?  6713  3.4%
Puerto-Rico  2680  1.3%
Italy  2212  1.1%
Canada  1380  0.7%
Germany  1356  0.7%
Dominican-Republic  1290  0.6%
Poland  1212  0.6%
Philippines  1154  0.6%
Other values (33)  12355  6.2%

Length

Max length: 28
Mean length: 11.66875332
Min length: 1

Value  Count  Frequency (%)
Lowercase_Letter  21  44.7%
Uppercase_Letter  20  42.6%
Other_Punctuation  2  4.3%
Space_Separator  1  2.1%
Open_Punctuation  1  2.1%
Close_Punctuation  1  2.1%
Dash_Punctuation  1  2.1%

Value  Count  Frequency (%)
Latin  41  87.2%
Common  6  12.8%

Value  Count  Frequency (%)
ASCII  47  100.0%
 
Distinct count: 43
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value  Count  Frequency (%)
United-States  160478  80.4%
Mexico  9781  4.9%
?  6119  3.1%
Puerto-Rico  2473  1.2%
Italy  1844  0.9%
Canada  1451  0.7%
Germany  1382  0.7%
Philippines  1231  0.6%
Poland  1110  0.6%
El-Salvador  1108  0.6%
Other values (33)  12545  6.3%

Length

Max length28
Mean length11.72126382
Min length1
ValueCountFrequency (%) 
Lowercase_Letter 21 44.7%
 
Uppercase_Letter 20 42.6%
 
Other_Punctuation 2 4.3%
 
Space_Separator 1 2.1%
 
Open_Punctuation 1 2.1%
 
Close_Punctuation 1 2.1%
 
Dash_Punctuation 1 2.1%
 
ValueCountFrequency (%) 
Latin 41 87.2%
 
Common 6 12.8%
 
ValueCountFrequency (%) 
ASCII 47 100.0%
 

country of birth self
Categorical

Distinct count: 43
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
United-States | 176988 | 88.7%
Mexico | 5767 | 2.9%
? | 3393 | 1.7%
Puerto-Rico | 1400 | 0.7%
Germany | 851 | 0.4%
Philippines | 845 | 0.4%
Cuba | 837 | 0.4%
Canada | 700 | 0.4%
Dominican-Republic | 690 | 0.3%
El-Salvador | 689 | 0.3%
Other values (33) | 7362 | 3.7%

Length

Max length: 28
Mean length: 12.27975361
Min length: 1

Value | Count | Frequency (%)
Lowercase_Letter | 21 | 44.7%
Uppercase_Letter | 20 | 42.6%
Other_Punctuation | 2 | 4.3%
Space_Separator | 1 | 2.1%
Open_Punctuation | 1 | 2.1%
Close_Punctuation | 1 | 2.1%
Dash_Punctuation | 1 | 2.1%

Value | Count | Frequency (%)
Latin | 41 | 87.2%
Common | 6 | 12.8%

Value | Count | Frequency (%)
ASCII | 47 | 100.0%
 

citizenship
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Native- Born in the United States | 176991 | 88.7%
Foreign born- Not a citizen of U S | 13401 | 6.7%
Foreign born- U S citizen by naturalization | 5855 | 2.9%
Native- Born abroad of American Parent(s) | 1756 | 0.9%
Native- Born in Puerto Rico or U S Outlying | 1519 | 0.8%

Length

Max length: 43
Mean length: 33.57432263
Min length: 33

Value | Count | Frequency (%)
Lowercase_Letter | 20 | 60.6%
Uppercase_Letter | 9 | 27.3%
Space_Separator | 1 | 3.0%
Dash_Punctuation | 1 | 3.0%
Open_Punctuation | 1 | 3.0%
Close_Punctuation | 1 | 3.0%

Value | Count | Frequency (%)
Latin | 29 | 87.9%
Common | 4 | 12.1%

Value | Count | Frequency (%)
ASCII | 33 | 100.0%
 

own business or self employed
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
0 | 180671 | 90.6%
2 | 16153 | 8.1%
1 | 2698 | 1.4%

Length

Max length: 1
Mean length: 1
Min length: 1

Value | Count | Frequency (%)
Decimal_Number | 3 | 100.0%

Value | Count | Frequency (%)
Common | 3 | 100.0%

Value | Count | Frequency (%)
ASCII | 3 | 100.0%
 

fill inc questionnaire for veteran's admin
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Not in universe | 197538 | 99.0%
No | 1593 | 0.8%
Yes | 391 | 0.2%

Length

Max length: 15
Mean length: 14.87269073
Min length: 2

Value | Count | Frequency (%)
Lowercase_Letter | 9 | 75.0%
Uppercase_Letter | 2 | 16.7%
Space_Separator | 1 | 8.3%

Value | Count | Frequency (%)
Latin | 11 | 91.7%
Common | 1 | 8.3%

Value | Count | Frequency (%)
ASCII | 12 | 100.0%
 

veterans benefits
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
2 | 150129 | 75.2%
0 | 47409 | 23.8%
1 | 1984 | 1.0%

Length

Max length: 1
Mean length: 1
Min length: 1

Value | Count | Frequency (%)
Decimal_Number | 3 | 100.0%

Value | Count | Frequency (%)
Common | 3 | 100.0%

Value | Count | Frequency (%)
ASCII | 3 | 100.0%
 

weeks worked in year
Real number (ℝ≥0)

ZEROS
Distinct count: 53
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.175013281743368
Minimum: 0
Maximum: 52
Zeros: 95982
Zeros (%): 48.1%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 8
Q3: 52
95-th percentile: 52
Maximum: 52
Range: 52
Interquartile range (IQR): 52

Descriptive statistics

Standard deviation: 24.41149421
Coefficient of variation (CV): 1.053354055
Kurtosis: -1.863809327
Mean: 23.17501328
Mean absolute deviation (MAD): 23.68275158
Skewness: 0.2101602532
Sum: 4623925
Variance: 595.9210495
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 3.5 4.5 5.5 ... 48.5 49.5 50.5 51.5 52. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 95982 | 48.1%
52 | 70314 | 35.2%
40 | 2790 | 1.4%
50 | 2304 | 1.2%
26 | 2268 | 1.1%
48 | 1806 | 0.9%
12 | 1780 | 0.9%
30 | 1378 | 0.7%
20 | 1330 | 0.7%
8 | 1126 | 0.6%
Other values (43) | 18444 | 9.2%

Minimum 5 values

Value | Count | Frequency (%)
0 | 95982 | 48.1%
1 | 464 | 0.2%
2 | 458 | 0.2%
3 | 417 | 0.2%
4 | 757 | 0.4%

Maximum 5 values

Value | Count | Frequency (%)
52 | 70314 | 35.2%
51 | 819 | 0.4%
50 | 2304 | 1.2%
49 | 509 | 0.3%
48 | 1806 | 0.9%
 

year
Categorical

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
94 | 99827 | 50.0%
95 | 99695 | 50.0%

Length

Max length: 2
Mean length: 2
Min length: 2

Value | Count | Frequency (%)
Decimal_Number | 3 | 100.0%

Value | Count | Frequency (%)
Common | 3 | 100.0%

Value | Count | Frequency (%)
ASCII | 3 | 100.0%
 

label
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
- 50000. | 187140 | 93.8%
50000+. | 12382 | 6.2%

Length

Max length: 8
Mean length: 7.937941681
Min length: 7

Value | Count | Frequency (%)
Decimal_Number | 2 | 33.3%
Space_Separator | 1 | 16.7%
Dash_Punctuation | 1 | 16.7%
Other_Punctuation | 1 | 16.7%
Math_Symbol | 1 | 16.7%

Value | Count | Frequency (%)
Common | 6 | 100.0%

Value | Count | Frequency (%)
ASCII | 6 | 100.0%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that, for a linear relationship, the slope of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
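
As a minimal sketch of the calculation described above, the following computes r using only the Python standard library. The function name `pearson_r` and the toy `weeks` and `wage` values are hypothetical and are not taken from this dataset; the report itself may compute the coefficient differently.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors of covariance and standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: a constant input has zero variance and makes r undefined (division by zero).
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, for illustration only.
weeks = [0, 12, 26, 40, 52]
wage = [0.0, 5.5, 7.0, 9.5, 12.0]

r = pearson_r(weeks, wage)
# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([w * 7 + 3 for w in weeks], wage)
print(r, r_rescaled)
```

The second call illustrates the invariance under changes in location and scale mentioned above.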

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
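
A minimal sketch of this rank-based calculation follows, using only the Python standard library. The helper names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the average of their ranks is an assumption (a common convention), not something this report states.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation applied to the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A nonlinear but strictly increasing relationship still gives rho = 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
```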

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
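
The pair-counting definition above can be sketched directly in Python. The function name `kendall_tau_a` is hypothetical. The sketch implements the formula exactly as worded here (often called tau-a); statistical libraries commonly use a tie-corrected variant, so results on data with many ties may differ from the values shown in this report.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    total = 0
    for i, j in combinations(range(len(x)), 2):
        total += 1
        # The sign of the product tells whether both variables move the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means a tie in x or y: the pair counts toward the total only.
    return (concordant - discordant) / total


# Three observations: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

This brute-force version compares every pair, so it is quadratic in the number of observations and only practical for small samples.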

Missing values

Sample

First rows

age | class of worker | detailed industry recode | detailed occupation recode | education | wage per hour | enroll in edu inst last wk | marital stat | major industry code | major occupation code | race | hispanic origin | sex | member of a labor union | reason for unemployment | full or part time employment stat | capital gains | capital losses | dividends from stocks | tax filer stat | region of previous residence | state of previous residence | detailed household and family stat | detailed household summary in household | instance weight | migration code-change in msa | migration code-change in reg | migration code-move within reg | live in this house 1 year ago | migration prev res in sunbelt | num persons worked for employer | family members under 18 | country of birth father | country of birth mother | country of birth self | citizenship | own business or self employed | fill inc questionnaire for veteran's admin | veterans benefits | weeks worked in year | year | label
058Self-employed-not incorporated434Some college but no degree0Not in universeDivorcedConstructionPrecision production craft & repairWhiteAll otherMaleNot in universeNot in universeChildren or Armed Forces000Head of householdSouthArkansasHouseholderHouseholder1053.55MSA to MSASame countySame countyNoYes1Not in universeUnited-StatesUnited-StatesUnited-StatesNative- Born in the United States0Not in universe25294- 50000.
118Not in universe0010th grade0High schoolNever marriedNot in universe or childrenNot in universeAsian or Pacific IslanderAll otherFemaleNot in universeNot in universeNot in labor force000NonfilerNot in universeNot in universeChild 18+ never marr Not in a subfamilyChild 18 or older991.95???Not in universe under 1 year old?0Not in universeVietnamVietnamVietnamForeign born- Not a citizen of U S0Not in universe2095- 50000.
29Not in universe00Children0Not in universeNever marriedNot in universe or childrenNot in universeWhiteAll otherFemaleNot in universeNot in universeChildren or Armed Forces000NonfilerNot in universeNot in universeChild <18 never marr not in subfamilyChild under 18 never married1758.14NonmoverNonmoverNonmoverYesNot in universe0Both parents presentUnited-StatesUnited-StatesUnited-StatesNative- Born in the United States0Not in universe0094- 50000.
310Not in universe00Children0Not in universeNever marriedNot in universe or childrenNot in universeWhiteAll otherFemaleNot in universeNot in universeChildren or Armed Forces000NonfilerNot in universeNot in universeChild <18 never marr not in subfamilyChild under 18 never married1069.16NonmoverNonmoverNonmoverYesNot in universe0Both parents presentUnited-StatesUnited-StatesUnited-StatesNative- Born in the United States0Not in universe0094- 50000.
448Private4010Some college but no degree1200Not in universeMarried-civilian spouse presentEntertainmentProfessional specialtyAmer Indian Aleut or EskimoAll otherFemaleNoNot in universeFull-time schedules000Joint both under 65Not in universeNot in universeSpouse of householderSpouse of householder162.61???Not in universe under 1 year old?1Not in universePhilippinesUnited-StatesUnited-StatesNative- Born in the United States2Not in universe25295- 50000.
542Private343Bachelors degree(BA AB BS)0Not in universeMarried-civilian spouse presentFinance insurance and real estateExecutive admin and managerialWhiteAll otherMaleNot in universeNot in universeChildren or Armed Forces517800Joint both under 65Not in universeNot in universeHouseholderHouseholder1535.86NonmoverNonmoverNonmoverYesNot in universe6Not in universeUnited-StatesUnited-StatesUnited-StatesNative- Born in the United States0Not in universe25294- 50000.
628Private440High school graduate0Not in universeNever marriedConstructionHandlers equip cleaners etcWhiteAll otherFemaleNot in universeJob loser - on layoffUnemployed full-time000SingleNot in universeNot in universeSecondary individualNonrelative of householder898.83???Not in universe under 1 year old?4Not in universeUnited-StatesUnited-StatesUnited-StatesNative- Born in the United States0Not in universe23095- 50000.
747Local government4326Some college but no degree876Not in universeMarried-civilian spouse presentEducationAdm support including clericalWhiteAll otherFemaleNoNot in universeFull-time schedules000Joint both under 65Not in universeNot in universeSpouse of householderSpouse of householder1661.53???Not in universe under 1 year old?5Not in universeUnited-StatesUnited-StatesUnited-StatesNative- Born in the United States0Not in universe25295- 50000.
834Private437Some college but no degree0Not in universeMarried-civilian spouse presentConstructionMachine operators assmblrs & inspctrsWhiteAll otherMaleNot in universeNot in universeChildren or Armed Forces000Joint both under 65Not in universeNot in universeHouseholderHouseholder1146.79NonmoverNonmoverNonmoverYesNot in universe6Not in universeUnited-StatesUnited-StatesUnited-StatesNative- Born in the United States0Not in universe25294- 50000.
98Not in universe00Children0Not in universeNever marriedNot in universe or childrenNot in universeWhiteAll otherFemaleNot in universeNot in universeChildren or Armed Forces000NonfilerNot in universeNot in universeChild <18 never marr not in subfamilyChild under 18 never married2466.24NonmoverNonmoverNonmoverYesNot in universe0Both parents presentUnited-StatesUnited-StatesUnited-StatesNative- Born in the United States0Not in universe0094- 50000.

Last rows

age | class of worker | detailed industry recode | detailed occupation recode | education | wage per hour | enroll in edu inst last wk | marital stat | major industry code | major occupation code | race | hispanic origin | sex | member of a labor union | reason for unemployment | full or part time employment stat | capital gains | capital losses | dividends from stocks | tax filer stat | region of previous residence | state of previous residence | detailed household and family stat | detailed household summary in household | instance weight | migration code-change in msa | migration code-change in reg | migration code-move within reg | live in this house 1 year ago | migration prev res in sunbelt | num persons worked for employer | family members under 18 | country of birth father | country of birth mother | country of birth self | citizenship | own business or self employed | fill inc questionnaire for veteran's admin | veterans benefits | weeks worked in year | year | label
19951257Private9379th grade0Not in universeDivorcedManufacturing-durable goodsMachine operators assmblrs & inspctrsWhiteCentral or South AmericanFemaleNot in universeNot in universeFull-time schedules000SingleNot in universeNot in universeHouseholderHouseholder743.66???Not in universe under 1 year old?4Not in universeDominican-RepublicDominican-RepublicDominican-RepublicForeign born- Not a citizen of U S0Not in universe25295- 50000.
19951351Private331910th grade0Not in universeWidowedRetail tradeSalesWhiteAll otherFemaleNot in universeNot in universeChildren or Armed Forces000SingleSouthNorth DakotaHouseholderHouseholder1302.34NonMSA to nonMSASame countySame countyNoYes6Not in universeUnited-StatesUnited-StatesUnited-StatesNative- Born in the United States0Not in universe25294- 50000.
19951487Not in universe00High school graduate0Not in universeWidowedNot in universe or childrenNot in universeWhiteAll otherFemaleNot in universeNot in universeNot in labor force000SingleNot in universeNot in universeNonfamily householderHouseholder3255.80???Not in universe under 1 year old?0Not in universe?United-StatesUnited-StatesNative- Born in the United States0Not in universe2095- 50000.
1995153Not in universe00Children0Not in universeNever marriedNot in universe or childrenNot in universeBlackAll otherMaleNot in universeNot in universeChildren or Armed Forces000NonfilerSouthUtahChild under 18 of RP of unrel subfamilyNonrelative of householder2733.75MSA to MSASame countySame countyNoYes0Mother only presentUnited-StatesUnited-StatesUnited-StatesNative- Born in the United States0Not in universe0094- 50000.
19951639Private4326Bachelors degree(BA AB BS)0Not in universeNever marriedEducationAdm support including clericalOtherMexican-AmericanMaleNoNot in universeFull-time schedules684900SingleNot in universeNot in universeNonfamily householderHouseholder908.14???Not in universe under 1 year old?6Not in universeMexicoMexicoMexicoForeign born- Not a citizen of U S2Not in universe25295- 50000.
19951787Not in universe007th and 8th grade0Not in universeMarried-civilian spouse presentNot in universe or childrenNot in universeWhiteAll otherMaleNot in universeNot in universeNot in labor force000Joint both 65+Not in universeNot in universeHouseholderHouseholder955.27???Not in universe under 1 year old?0Not in universeCanadaUnited-StatesUnited-StatesNative- Born in the United States0Not in universe2095- 50000.
19951865Self-employed-incorporated37211th grade0Not in universeMarried-civilian spouse presentBusiness and repair servicesExecutive admin and managerialWhiteAll otherMaleNot in universeNot in universeChildren or Armed Forces641809Joint one under 65 & one 65+Not in universeNot in universeHouseholderHouseholder687.19NonmoverNonmoverNonmoverYesNot in universe1Not in universeUnited-StatesUnited-StatesUnited-StatesNative- Born in the United States0Not in universe25294- 50000.
19951947Not in universe00Some college but no degree0Not in universeMarried-civilian spouse presentNot in universe or childrenNot in universeWhiteAll otherMaleNot in universeNot in universeChildren or Armed Forces00157Joint both under 65Not in universeNot in universeHouseholderHouseholder1923.03???Not in universe under 1 year old?6Not in universePolandPolandGermanyForeign born- U S citizen by naturalization0Not in universe25295- 50000.
19952016Not in universe0010th grade0High schoolNever marriedNot in universe or childrenNot in universeWhiteAll otherFemaleNot in universeNot in universeNot in labor force000NonfilerNot in universeNot in universeChild <18 never marr not in subfamilyChild under 18 never married4664.87???Not in universe under 1 year old?0Both parents presentUnited-StatesUnited-StatesUnited-StatesNative- Born in the United States0Not in universe2095- 50000.
19952132Private4230High school graduate0Not in universeNever marriedMedical except hospitalOther serviceBlackAll otherFemaleNoNot in universeChildren or Armed Forces000SingleNot in universeNot in universeNonfamily householderHouseholder1830.11NonmoverNonmoverNonmoverYesNot in universe6Not in universe???Foreign born- Not a citizen of U S0Not in universe25294- 50000.